consecutive frame
- North America > United States > New Jersey (0.04)
- North America > United States > California > Merced County > Merced (0.04)
- North America > Canada (0.04)
- Asia > China > Beijing > Beijing (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Asia > India > Tamil Nadu > Chennai (0.04)
RaFD: Flow-Guided Radar Detection for Robust Autonomous Driving
Yang, Shuocheng, Xu, Zikun, Wang, Jiahao, Nawaz, Shahid, Wang, Jianqiang, Xu, Shaobing
Radar has shown strong potential for robust perception in autonomous driving; however, raw radar images are frequently degraded by noise and "ghost" artifacts, making object detection based solely on semantic features highly challenging. To address this limitation, we introduce RaFD, a radar-based object detection framework that estimates inter-frame bird's-eye-view (BEV) flow and leverages the resulting geometric cues to enhance detection accuracy. Specifically, we design a supervised flow estimation auxiliary task that is jointly trained with the detection network. The estimated flow is further utilized to guide feature propagation from the previous frame to the current one. Our flow-guided, radar-only detector achieves achieves state-of-the-art performance on the RADIATE dataset, underscoring the importance of incorporating geometric information to effectively interpret radar signals, which are inherently ambiguous in semantics.
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.05)
- North America > Canada > Quebec > Montreal (0.05)
- Europe > Switzerland (0.05)
- (2 more...)
- Transportation > Ground > Road (0.63)
- Information Technology > Robotics & Automation (0.63)
- Automobiles & Trucks (0.63)
- Asia > China > Tianjin Province > Tianjin (0.05)
- Asia > China > Liaoning Province > Dalian (0.05)
Lightweight Multi-Frame Integration for Robust YOLO Object Detection in Videos
Quan, Yitong, Kiefer, Benjamin, Messmer, Martin, Zell, Andreas
Modern image-based object detection models, such as YOLOv7, primarily process individual frames independently, thus ignoring valuable temporal context naturally present in videos. Meanwhile, existing video-based detection methods often introduce complex temporal modules, significantly increasing model size and computational complexity. In practical applications such as surveillance and autonomous driving, transient challenges including motion blur, occlusions, and abrupt appearance changes can severely degrade single-frame detection performance. To address these issues, we propose a straightforward yet highly effective strategy: stacking multiple consecutive frames as input to a YOLO-based detector while supervising only the output corresponding to a single target frame. This approach leverages temporal information with minimal modifications to existing architectures, preserving simplicity, computational efficiency, and real-time inference capability. Extensive experiments on the challenging MOT20Det and our BOAT360 datasets demonstrate that our method improves detection robustness, especially for lightweight models, effectively narrowing the gap between compact and heavy detection networks. Additionally, we contribute the BOAT360 benchmark dataset, comprising annotated fisheye video sequences captured from a boat, to support future research in multi-frame video object detection in challenging real-world scenarios.
- Europe > Switzerland > Zürich > Zürich (0.14)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
- Information Technology (0.49)
- Transportation > Ground > Road (0.34)
- Automobiles & Trucks (0.34)
MARVIS: Motion & Geometry Aware Real and Virtual Image Segmentation
Wu, Jiayi, Lin, Xiaomin, Negahdaripour, Shahriar, Fermüller, Cornelia, Aloimonos, Yiannis
Tasks such as autonomous navigation, 3D reconstruction, and object recognition near the water surfaces are crucial in marine robotics applications. However, challenges arise due to dynamic disturbances, e.g., light reflections and refraction from the random air-water interface, irregular liquid flow, and similar factors, which can lead to potential failures in perception and navigation systems. Traditional computer vision algorithms struggle to differentiate between real and virtual image regions, significantly complicating tasks. A virtual image region is an apparent representation formed by the redirection of light rays, typically through reflection or refraction, creating the illusion of an object's presence without its actual physical location. This work proposes a novel approach for segmentation on real and virtual image regions, exploiting synthetic images combined with domain-invariant information, a Motion Entropy Kernel, and Epipolar Geometric Consistency. Our segmentation network does not need to be re-trained if the domain changes. We show this by deploying the same segmentation network in two different domains: simulation and the real world. By creating realistic synthetic images that mimic the complexities of the water surface, we provide fine-grained training data for our network (MARVIS) to discern between real and virtual images effectively. By motion & geometry-aware design choices and through comprehensive experimental analysis, we achieve state-of-the-art real-virtual image segmentation performance in unseen real world domain, achieving an IoU over 78% and a F1-Score over 86% while ensuring a small computational footprint. MARVIS offers over 43 FPS (8 FPS) inference rates on a single GPU (CPU core). Our code and dataset are available here https://github.com/jiayi-wu-umd/MARVIS.
- North America > United States > Maryland > Prince George's County > College Park (0.14)
- North America > United States > Florida > Miami-Dade County > Coral Gables (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Natural Language (0.88)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)